Ungraded Lab: Cats vs. Dogs Class Activation Maps

You will again practice with CAMs in this lab and this time there will only be two classes: Cats and Dogs. You will be revisiting this exercise in this week's programming assignment so it's best if you become familiar with the steps discussed here, particularly in preprocessing the image and building the model.

Imports

In [1]:
import tensorflow_datasets as tfds
import tensorflow as tf

import keras
from keras.models import Sequential,Model
from keras.layers import Dense,Conv2D,Flatten,MaxPooling2D,GlobalAveragePooling2D

import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import cv2

Download and Prepare the Dataset

We will use the Cats vs Dogs dataset and we can load it via Tensorflow Datasets. The images are labeled 0 for cats and 1 for dogs.

In [2]:
train_data = tfds.load('cats_vs_dogs', split='train[:80%]', as_supervised=True)
validation_data = tfds.load('cats_vs_dogs', split='train[80%:90%]', as_supervised=True)
test_data = tfds.load('cats_vs_dogs', split='train[-10%:]', as_supervised=True)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/cats_vs_dogs/4.0.0...
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Generating splits...:   0%|          | 0/1 [00:00<?, ? splits/s]
Generating train examples...: 0 examples [00:00, ? examples/s]
WARNING:absl:1738 images were corrupted and were skipped
Shuffling /root/tensorflow_datasets/cats_vs_dogs/4.0.0.incompleteWSX9V0/cats_vs_dogs-train.tfrecord*...:   0%|…
Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.0. Subsequent calls will reuse this data.

The cell below will preprocess the images and create batches before feeding it to our model.

In [3]:
def augment_images(image, label):
  
  # cast to float
  image = tf.cast(image, tf.float32)
  # normalize the pixel values
  image = (image/255)
  # resize to 300 x 300
  image = tf.image.resize(image,(300,300))

  return image, label

# use the utility function above to preprocess the images
augmented_training_data = train_data.map(augment_images)

# shuffle and create batches before training
train_batches = augmented_training_data.shuffle(1024).batch(32)

Build the classifier

This will look familiar to you because it is almost identical to the previous model we built. The key difference is the output is just one unit that is sigmoid activated. This is because we're only dealing with two classes.

In [4]:
model = Sequential()
model.add(Conv2D(16,input_shape=(300,300,3),kernel_size=(3,3),activation='relu',padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32,kernel_size=(3,3),activation='relu',padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64,kernel_size=(3,3),activation='relu',padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(128,kernel_size=(3,3),activation='relu',padding='same'))
model.add(GlobalAveragePooling2D())
model.add(Dense(1,activation='sigmoid'))

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 300, 300, 16)      448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 150, 150, 16)     0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 150, 150, 32)      4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 75, 75, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 75, 75, 64)        18496     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 37, 37, 64)       0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 37, 37, 128)       73856     
                                                                 
 global_average_pooling2d (G  (None, 128)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
=================================================================
Total params: 97,569
Trainable params: 97,569
Non-trainable params: 0
_________________________________________________________________

The loss can be adjusted from last time to deal with just two classes. For that, we pick binary_crossentropy.

In [5]:
# Training will take around 30 minutes to complete using a GPU. Time for a break!

model.compile(loss='binary_crossentropy',metrics=['accuracy'],optimizer=tf.keras.optimizers.RMSprop(lr=0.001))
model.fit(train_batches,epochs=25)
WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.RMSprop.
Epoch 1/25
582/582 [==============================] - 77s 108ms/step - loss: 0.6640 - accuracy: 0.5854
Epoch 2/25
582/582 [==============================] - 58s 96ms/step - loss: 0.6263 - accuracy: 0.6433
Epoch 3/25
582/582 [==============================] - 51s 84ms/step - loss: 0.5999 - accuracy: 0.6728
Epoch 4/25
582/582 [==============================] - 50s 83ms/step - loss: 0.5856 - accuracy: 0.6909
Epoch 5/25
582/582 [==============================] - 54s 89ms/step - loss: 0.5757 - accuracy: 0.6989
Epoch 6/25
582/582 [==============================] - 51s 84ms/step - loss: 0.5668 - accuracy: 0.7092
Epoch 7/25
582/582 [==============================] - 57s 93ms/step - loss: 0.5555 - accuracy: 0.7188
Epoch 8/25
582/582 [==============================] - 51s 84ms/step - loss: 0.5478 - accuracy: 0.7254
Epoch 9/25
582/582 [==============================] - 51s 84ms/step - loss: 0.5402 - accuracy: 0.7324
Epoch 10/25
582/582 [==============================] - 51s 85ms/step - loss: 0.5306 - accuracy: 0.7436
Epoch 11/25
582/582 [==============================] - 52s 85ms/step - loss: 0.5203 - accuracy: 0.7478
Epoch 12/25
582/582 [==============================] - 52s 84ms/step - loss: 0.5154 - accuracy: 0.7508
Epoch 13/25
582/582 [==============================] - 51s 84ms/step - loss: 0.5092 - accuracy: 0.7559
Epoch 14/25
582/582 [==============================] - 51s 85ms/step - loss: 0.5015 - accuracy: 0.7589
Epoch 15/25
582/582 [==============================] - 51s 84ms/step - loss: 0.4913 - accuracy: 0.7671
Epoch 16/25
582/582 [==============================] - 51s 85ms/step - loss: 0.4899 - accuracy: 0.7682
Epoch 17/25
582/582 [==============================] - 57s 95ms/step - loss: 0.4793 - accuracy: 0.7777
Epoch 18/25
582/582 [==============================] - 51s 85ms/step - loss: 0.4674 - accuracy: 0.7828
Epoch 19/25
582/582 [==============================] - 50s 83ms/step - loss: 0.4639 - accuracy: 0.7842
Epoch 20/25
582/582 [==============================] - 51s 85ms/step - loss: 0.4562 - accuracy: 0.7883
Epoch 21/25
582/582 [==============================] - 51s 85ms/step - loss: 0.4527 - accuracy: 0.7928
Epoch 22/25
582/582 [==============================] - 51s 84ms/step - loss: 0.4414 - accuracy: 0.8000
Epoch 23/25
582/582 [==============================] - 57s 94ms/step - loss: 0.4336 - accuracy: 0.8046
Epoch 24/25
582/582 [==============================] - 51s 84ms/step - loss: 0.4304 - accuracy: 0.8086
Epoch 25/25
582/582 [==============================] - 51s 84ms/step - loss: 0.4181 - accuracy: 0.8121
Out[5]:
<keras.callbacks.History at 0x7fdc9b5c1ff0>

Building the CAM model

You will follow the same steps as before in generating the class activation maps.

In [6]:
gap_weights = model.layers[-1].get_weights()[0]
gap_weights.shape

cam_model  = Model(inputs=model.input,outputs=(model.layers[-3].output,model.layers[-1].output))
cam_model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_input (InputLayer)   [(None, 300, 300, 3)]     0         
                                                                 
 conv2d (Conv2D)             (None, 300, 300, 16)      448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 150, 150, 16)     0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 150, 150, 32)      4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 75, 75, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 75, 75, 64)        18496     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 37, 37, 64)       0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 37, 37, 128)       73856     
                                                                 
 global_average_pooling2d (G  (None, 128)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
=================================================================
Total params: 97,569
Trainable params: 97,569
Non-trainable params: 0
_________________________________________________________________
In [7]:
def show_cam(image_value, features, results):
  '''
  Displays the class activation map of an image

  Args:
    image_value (tensor) -- preprocessed input image with size 300 x 300
    features (array) -- features of the image, shape (1, 37, 37, 128)
    results (array) -- output of the sigmoid layer
  '''

  # there is only one image in the batch so we index at `0`
  features_for_img = features[0]
  prediction = results[0]

  # there is only one unit in the output so we get the weights connected to it
  class_activation_weights = gap_weights[:,0]

  # upsample to the image size
  class_activation_features = sp.ndimage.zoom(features_for_img, (300/37, 300/37, 1), order=2)
  
  # compute the intensity of each feature in the CAM
  cam_output  = np.dot(class_activation_features,class_activation_weights)

  # visualize the results
  print(f'sigmoid output: {results}')
  print(f"prediction: {'dog' if round(results[0][0]) else 'cat'}")
  plt.figure(figsize=(8,8))
  plt.imshow(cam_output, cmap='jet', alpha=0.5)
  plt.imshow(tf.squeeze(image_value), alpha=0.5)
  plt.show()

Testing the Model

Let's download a few images and see how the class activation maps look like.

In [8]:
!wget -O cat1.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat1.jpeg
!wget -O cat2.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat2.jpeg
!wget -O catanddog.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/catanddog.jpeg
!wget -O dog1.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog1.jpeg
!wget -O dog2.jpg https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog2.jpeg
--2023-04-28 21:05:35--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat1.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.125.128, 142.250.136.128, 142.250.148.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.125.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 414826 (405K) [image/jpeg]
Saving to: ‘cat1.jpg’

cat1.jpg            100%[===================>] 405.10K  --.-KB/s    in 0.003s  

2023-04-28 21:05:35 (117 MB/s) - ‘cat1.jpg’ saved [414826/414826]

--2023-04-28 21:05:35--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/cat2.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.125.128, 142.250.136.128, 142.250.148.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.125.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 599639 (586K) [image/jpeg]
Saving to: ‘cat2.jpg’

cat2.jpg            100%[===================>] 585.58K  --.-KB/s    in 0.006s  

2023-04-28 21:05:35 (97.9 MB/s) - ‘cat2.jpg’ saved [599639/599639]

--2023-04-28 21:05:35--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/catanddog.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.125.128, 142.250.136.128, 142.250.148.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.125.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 561943 (549K) [image/jpeg]
Saving to: ‘catanddog.jpg’

catanddog.jpg       100%[===================>] 548.77K  --.-KB/s    in 0.004s  

2023-04-28 21:05:35 (131 MB/s) - ‘catanddog.jpg’ saved [561943/561943]

--2023-04-28 21:05:35--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog1.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.125.128, 142.250.136.128, 142.250.148.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.125.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 338769 (331K) [image/jpeg]
Saving to: ‘dog1.jpg’

dog1.jpg            100%[===================>] 330.83K  --.-KB/s    in 0.003s  

2023-04-28 21:05:36 (116 MB/s) - ‘dog1.jpg’ saved [338769/338769]

--2023-04-28 21:05:36--  https://storage.googleapis.com/tensorflow-1-public/tensorflow-3-temp/MLColabImages/dog2.jpeg
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.125.128, 142.250.136.128, 142.250.148.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.125.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 494803 (483K) [image/jpeg]
Saving to: ‘dog2.jpg’

dog2.jpg            100%[===================>] 483.21K  --.-KB/s    in 0.004s  

2023-04-28 21:05:36 (119 MB/s) - ‘dog2.jpg’ saved [494803/494803]

In [9]:
# utility function to preprocess an image and show the CAM
def convert_and_classify(image):

  # load the image
  img = cv2.imread(image)

  # preprocess the image before feeding it to the model
  img = cv2.resize(img, (300,300)) / 255.0

  # add a batch dimension because the model expects it
  tensor_image = np.expand_dims(img, axis=0)

  # get the features and prediction
  features,results = cam_model.predict(tensor_image)
  
  # generate the CAM
  show_cam(tensor_image, features, results)

convert_and_classify('cat1.jpg')
convert_and_classify('cat2.jpg')
convert_and_classify('catanddog.jpg')
convert_and_classify('dog1.jpg')
convert_and_classify('dog2.jpg')
1/1 [==============================] - 0s 478ms/step
sigmoid output: [[0.1287439]]
prediction: cat
1/1 [==============================] - 0s 42ms/step
sigmoid output: [[0.905586]]
prediction: dog
1/1 [==============================] - 0s 23ms/step
sigmoid output: [[0.6290183]]
prediction: dog
1/1 [==============================] - 0s 22ms/step
sigmoid output: [[0.5264424]]
prediction: dog
1/1 [==============================] - 0s 21ms/step
sigmoid output: [[0.85430413]]
prediction: dog

Let's also try it with some of the test images before we make some observations.

In [10]:
# preprocess the test images
augmented_test_data = test_data.map(augment_images)
test_batches = augmented_test_data.batch(1)


for img, lbl in test_batches.take(5):
  print(f"ground truth: {'dog' if lbl else 'cat'}")
  features,results = cam_model.predict(img)
  show_cam(img, features, results)
ground truth: cat
1/1 [==============================] - 0s 61ms/step
sigmoid output: [[0.626493]]
prediction: dog
ground truth: dog
1/1 [==============================] - 0s 24ms/step
sigmoid output: [[0.33213592]]
prediction: cat
ground truth: dog
1/1 [==============================] - 0s 47ms/step
sigmoid output: [[0.6680481]]
prediction: dog
ground truth: cat
1/1 [==============================] - 0s 37ms/step
sigmoid output: [[0.0925537]]
prediction: cat
ground truth: cat
1/1 [==============================] - 0s 34ms/step
sigmoid output: [[0.5735391]]
prediction: dog

If your training reached 80% accuracy, you may notice from the images above that the presence of eyes and nose play a big part in determining a dog, while whiskers and a colar mostly point to a cat. Some can be misclassified based on the presence or absence of these features. This tells us that the model is not yet performing optimally and we need to tweak our process (e.g. add more data, train longer, use a different model, etc).